Clustering of Short Read Sequences for de novo Transcriptome Assembly

Authors

Samaneh Saadat

Zhaleh Safikhani

Kambiz Badie

Mehdi Sadeghi

Abstract

Given the importance of transcriptome analysis in various biological studies and the vast amount of whole-transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graphs with different k-mer lengths. Then, the eclectic mixtures of sequences are gathered in order to form the final sequences. Lastly, the contiguous sequences are clustered and the isoform groups are provided. The proposed algorithm is capable of generating long contiguous sequences and accurately clustering them into isoform groups. To evaluate our algorithm, we applied it to a simulated RNA-seq dataset of the rat transcriptome and a real RNA-seq experiment of the Loricaria gr. cataphracta transcriptome. The correctness of the assembled contigs was more than 95%, and our algorithm was able to reconstruct over 70% of the transcripts at more than 80% of the transcripts’ lengths. This study demonstrates that applying a sophisticated merging method improves transcriptome assembly. The source code is available upon request by contacting the corresponding author by email.
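
The first step described in the abstract (contigs from de Bruijn graphs built at several k-mer lengths) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available only on request. The function names `build_de_bruijn`, `unambiguous_contigs`, and `assemble_multi_k` are hypothetical, and the final step simply drops contigs contained in longer ones, a naive stand-in for the paper's merging method. The sketch ignores sequencing errors, reverse complements, coverage, and isolated cycles.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer successors seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unambiguous_contigs(graph):
    """Spell out maximal non-branching paths of the graph as contig strings."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def is_simple(node):
        # Exactly one incoming and one outgoing edge: interior of a path.
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # paths start only at branching or terminal nodes
        for succ in sorted(graph.get(node, ())):
            contig = node + succ[-1]
            cur = succ
            while is_simple(cur):
                cur = next(iter(graph[cur]))
                contig += cur[-1]
            contigs.append(contig)
    return contigs


def assemble_multi_k(reads, ks):
    """Pool contigs from several k values; keep those not contained in a longer one.

    Assumption: substring containment is used here only for illustration;
    the paper's merging step is more sophisticated and is not reproduced.
    """
    pooled = set()
    for k in ks:
        pooled.update(unambiguous_contigs(build_de_bruijn(reads, k)))
    kept = []
    for contig in sorted(pooled, key=lambda c: (-len(c), c)):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    reads = ["ATGGCGT", "GCGTGCA"]  # toy overlapping reads, made up for this example
    print(assemble_multi_k(reads, ks=(4, 5)))
```

On the toy reads, a small k such as 3 produces a branching graph and several short contigs, while k = 4 or 5 yields a single path. That trade-off is one reason to combine several k-mer lengths.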

Similar resources

Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome

Seabuckthorn (Hippophae rhamnoides L.) has been known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next generation massive parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of th...

The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

Transcriptome assembly using RNA-seq data, particularly in non-model organisms, has been dramatically improved, but only recently have the pre-assembly procedures, such as sequencing depth and error correction, been studied. Increasing read length is viewed as a crucial condition to further improve transcriptome assembly, but it is unknown whether the read length really matters. In addition, th...

Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

We have developed a new set of algorithms, collectively called "Velvet," to manipulate de Bruijn graphs for genomic sequence assembly. A de Bruijn graph is a compact representation based on short words (k-mers) that is ideal for high coverage, very short read (25-50 bp) data sets. Applying Velvet to very short reads and paired-ends information only, one can produce contigs of significant length...

De novo transcriptome assembly with ABySS

MOTIVATION: Whole transcriptome shotgun sequencing data from non-normalized samples offer unique opportunities to study the metabolic states of organisms. One can deduce gene expression levels using sequence coverage as a surrogate, identify coding changes or discover novel isoforms or transcripts. Especially for discovery of novel events, de novo assembly of transcriptomes is desirable. RESUL...

De novo assembly of short sequence reads

A new generation of sequencing technologies is revolutionizing molecular biology. Illumina's Solexa and Applied Biosystems' SOLiD generate gigabases of nucleotide sequence per week. However, a perceived limitation of these ultra-high-throughput technologies is their short read-lengths. De novo assembly of sequence reads generated by classical Sanger capillary sequencing is a mature field of res...

Journal title:
Progress in Biological Sciences

Publisher: University of Tehran

ISSN 1016-1058

Volume 4

Issue 1, 2014
